Some questions about the insights these datasets can offer:
Question 1: Which dog name is the most popular? How do names change over time, and how do they relate to rating_numerator, favorite_count, and retweet_count?
Question 2: Stage and its relationship with the number of tweets, favorite_count, and retweet_count
Question 3: Favourite breed? Its relationship with rating_numerator, favorite_count, and retweet_count
Question 4: Relationship between Favorite_Count, Retweet_Count and rating_numerator
Question 5: At which times are tweets posted, retweeted, or marked as favorites?
We have three datasets to read, twitter-archive-enhanced.csv, image_predictions.tsv, and tweet_json.txt, in three different file formats (CSV, TSV, and JSON stored as text).
twitter-archive-enhanced.csv, image_predictions.tsv, twitter_api.py from Udacity
In this part we perform all phases of the wrangling process: gathering, assessing, and cleaning the data. The wrangled data are then visualized and analyzed to draw insights from them.
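As a rough illustration of the gathering step, the three formats can be read with the Python standard library alone. This is only a sketch under assumptions: it supposes that tweet_json.txt holds one JSON object per line, and the column names and values in the in-memory samples (tweet_id, name, p1, and so on) are made up for the example. pandas is the more usual choice for this kind of work; the standard library is used here to keep the sketch dependency-free.

```python
import csv
import io
import json


def read_delimited(handle, delimiter=","):
    """Read a delimited text file into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def read_json_lines(handle):
    """Read one JSON object per non-empty line (assumed layout of tweet_json.txt)."""
    return [json.loads(line) for line in handle if line.strip()]


# Tiny in-memory stand-ins for the three files; all values are made up.
archive = read_delimited(io.StringIO("tweet_id,name\n1,Charlie\n2,Cooper\n"))
predictions = read_delimited(
    io.StringIO("tweet_id\tp1\n1\tgolden_retriever\n"), delimiter="\t"
)
tweets = read_json_lines(
    io.StringIO('{"id": 1, "favorite_count": 10, "retweet_count": 3}\n')
)
```

With real files, each helper would receive an open file object instead, for example `open("image_predictions.tsv", newline="", encoding="utf-8")` for the TSV file.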
In the next part, some interesting conclusions are presented, which give us a better view of the data. The data can tell us a lot, not only about the dogs posted on Twitter and which kinds of dogs are most liked, but also about the predicted breed of each dog and its stage. So from the images posted on Twitter we can gain information about the dogs.
But that is not all. We can also gain a lot of information about the users who post on Twitter, such as how they tweet, retweet, or like images. The data also reveal when they were on Twitter. Given the short time available for this project, only some of this information could be explored. With more time, we could perhaps find more interesting things, and not only about the dogs.
After cleaning the data and storing it in the twitter_archive_master.csv file, we do a small analysis of the dogs. The chart below presents the most common names of dogs posted on Twitter. The charts that follow add further information for each name.
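The name ranking behind such a chart can be sketched with `collections.Counter`. The rows below are hypothetical stand-ins for the cleaned data, and the assumption that a missing name is stored as `None` is mine, not something the notebook states.

```python
from collections import Counter

# Hypothetical cleaned rows; the names and counts are made up for illustration.
rows = [
    {"name": "Charlie", "favorite_count": 120, "retweet_count": 40},
    {"name": "Cooper", "favorite_count": 300, "retweet_count": 90},
    {"name": "Charlie", "favorite_count": 80, "retweet_count": 20},
    {"name": None, "favorite_count": 50, "retweet_count": 10},
]


def top_names(rows, n=3):
    """Return the n most frequent dog names, skipping rows without a name."""
    counts = Counter(row["name"] for row in rows if row["name"])
    return counts.most_common(n)


ranking = top_names(rows)
```

Skipping unnamed rows matters here, because otherwise the placeholder for a missing name could end up at the top of the ranking.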
Conclusion 1:
The charts show that Charlie is the most popular name among dogs posted on Twitter, followed by Cooper and Oliver. However, there seems to be no clear relationship between a dog's name and its favourite_count or retweet_count.
Question 2: Stage and its relationship with the number of tweets, favorite_count, and retweet_count
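For this question, the share of tweets per stage and the average rating per stage could be computed roughly as follows. This is a minimal sketch, not the notebook's actual code: the column name stage, the treatment of a combined label such as "doggo, puppo" as its own group, and the sample rows are all assumptions.

```python
from collections import defaultdict

# Hypothetical cleaned rows; a combined label is treated as its own group.
rows = [
    {"stage": "pupper", "rating_numerator": 10},
    {"stage": "pupper", "rating_numerator": 11},
    {"stage": "doggo", "rating_numerator": 12},
    {"stage": "doggo, puppo", "rating_numerator": 13},
]


def stage_summary(rows):
    """Return {stage: (share of tweets in percent, mean rating)}."""
    ratings = defaultdict(list)
    for row in rows:
        ratings[row["stage"]].append(row["rating_numerator"])
    total = sum(len(values) for values in ratings.values())
    return {
        stage: (100 * len(values) / total, sum(values) / len(values))
        for stage, values in ratings.items()
    }


summary = stage_summary(rows)
```

Reporting the share next to the mean makes it easier to see when a high average rests on only a handful of tweets.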
Conclusion 2:
Pupper is the stage posted most often on Twitter, at about 66%, followed by doggo and puppo. Interestingly, although puppers are posted most often, their average rating is the lowest, and so are their favourite_count and retweet_count. In contrast, dogs labelled doggo or puppo have the highest ratings as well as the highest favourite_count and retweet_count, whether each label appears alone or the two are combined. Unfortunately, there are very few pictures of them.
Question 3: Favourite breed? Its relationship with rating_numerator, favorite_count, and retweet_count
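For this question it helps to rank the breeds twice, once by how often they are posted and once by how much attention a typical tweet receives, since the two orders need not agree. The sketch below assumes a hypothetical breed column already joined to favorite_count; the rows are made up and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical rows after joining breed predictions to tweet counts; values are made up.
rows = [
    {"breed": "golden_retriever", "favorite_count": 100},
    {"breed": "golden_retriever", "favorite_count": 200},
    {"breed": "golden_retriever", "favorite_count": 300},
    {"breed": "toy_poodle", "favorite_count": 900},
    {"breed": "chihuahua", "favorite_count": 400},
    {"breed": "chihuahua", "favorite_count": 600},
]


def breed_rankings(rows):
    """Rank breeds twice: by number of tweets and by mean favorite_count."""
    favorites = defaultdict(list)
    for row in rows:
        favorites[row["breed"]].append(row["favorite_count"])
    by_tweets = sorted(favorites, key=lambda b: len(favorites[b]), reverse=True)
    by_mean = sorted(
        favorites, key=lambda b: sum(favorites[b]) / len(favorites[b]), reverse=True
    )
    return by_tweets, by_mean


most_posted, most_liked = breed_rankings(rows)
```

A breed with a single tweet can top the mean-based ranking, so in practice a minimum tweet count per breed may be worth applying before comparing averages.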
Conclusion 3:
The chart shows Golden Retriever as the most posted breed on Twitter. However, this breed does not attract much interest, as its favourite_count and retweet_count are low. The same holds for Labrador and Pembroke. The breeds that gain the most attention are Toy Poodle and Miniature Pinscher, in that order. Besides them, Chihuahua is also a well-liked breed, ranking in the top 3 for number of tweets, favourite_count, and retweet_count.
Question 4: Relationship between Favorite_Count, Retweet_Count and rating_numerator
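The strength of a pairwise relationship like the ones asked about here is commonly summarized with the Pearson correlation coefficient. The function below is a plain-Python sketch of that formula; whether the notebook used Pearson or another measure is not stated, and the sample values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values, only to show the call.
favorite_count = [100, 250, 400, 900]
retweet_count = [30, 80, 120, 310]
rating_numerator = [10, 11, 12, 13]

r_fav_rt = pearson(favorite_count, retweet_count)
r_fav_rating = pearson(favorite_count, rating_numerator)
```

A coefficient near 1 indicates a strong positive linear relationship, near 0 little linear relationship, and near -1 a strong negative one. Outliers can distort the coefficient noticeably, so extreme values deserve a check before the number is interpreted.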
Conclusion 4:
These charts suggest a close relationship between favourite_count, retweet_count, and rating_numerator. Each pairwise relationship is positive and quite strong, particularly the one between favourite_count and retweet_count.
Question 5: At which times are tweets posted, retweeted, or marked as favorites?
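Counting tweets per month, weekday, and hour comes down to parsing each timestamp and tallying its parts. The sketch below assumes timestamps formatted like "2017-01-03 01:05:00 +0000"; both that format string and the sample values are assumptions to adapt to the cleaned file.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout; adjust the format string to match the cleaned file.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S %z"

# Made-up timestamps for illustration.
timestamps = [
    "2017-01-03 01:05:00 +0000",  # a Tuesday
    "2017-01-10 16:40:00 +0000",  # a Tuesday
    "2017-02-08 01:30:00 +0000",  # a Wednesday
]


def tweet_counts(timestamps):
    """Count tweets per month (1-12), weekday (0 = Monday), and hour (0-23)."""
    parsed = [datetime.strptime(t, TIMESTAMP_FORMAT) for t in timestamps]
    by_month = Counter(d.month for d in parsed)
    by_weekday = Counter(d.weekday() for d in parsed)
    by_hour = Counter(d.hour for d in parsed)
    return by_month, by_weekday, by_hour


by_month, by_weekday, by_hour = tweet_counts(timestamps)
```

The hours are counted in whatever time zone the timestamps carry (UTC in this sample), so any statement about the time of day depends on that zone rather than on the users' local time.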
Conclusion 5:
The data show that people tweet most in January and February. October through December record the lowest average number of tweets. However, retweets peak in June, when the number of tweets is lowest.
Another interesting finding is that, within a week, people tend to tweet on Tuesday rather than at the weekend, and the retweet count and favourite_count then peak on Wednesday.
Meanwhile, people tend to tweet at night and in the afternoon, particularly at 1:00 a.m. and at 4:00 p.m. From 6:00 to 14:00, no tweets are recorded, but favorite_count and retweet_count are still documented. In contrast, the chart shows that at 6:00 a.m. favourite_count and retweet_count increase dramatically, then gradually decrease until 19:00, and then increase again.
The data give us a very interesting view of the dogs and of the way people use Twitter. In this project, we had the chance to collect data from a website, then do some challenging data cleaning, and make some interesting analyses. Fortunately, the wrangled data could answer some of the questions we set out with.
Of course, much interesting information remains hidden. With more time, we can explore it more effectively.